Exponential Language Models, Logistic Regression

Authors

  • Can Cai
  • Ronald Rosenfeld
  • Larry Wasserman

Abstract

In this paper, we modify the traditional trigram model by using utterance-level semantic coherence features in an exponential model. The semantic coherence features are collected by measuring the correlations among content-word pairs occurring in the sentences of two corpora: the real corpus and a corpus generated by the baseline trigram model. The measure we use for estimating the semantic association of content-word pairs is Yule's Q statistic. For our preliminary analysis, we have further simplified the modeling task by extracting a small set of statistics from each sentence's Q statistics and applying them as features to the exponential model. We also simplified the process of obtaining the MLE solutions of the exponential models by approximating them with a logistic regression model. We account for the uncertainty in the estimates of Q by constructing confidence intervals. The new model results in a slight reduction in test-set perplexity. We also discuss and compare alternative measures of association, such as χ² statistics.
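Yule's Q for one content-word pair can be computed from a 2×2 table of sentence counts. The sketch below is a minimal illustration, assuming the four cells count sentences that contain both words (a), only the first word (b), only the second word (c), or neither (d). The abstract does not say how its confidence intervals were built; here we assume a Woolf interval on the log odds ratio, mapped back to the Q scale through Q = tanh(ln(OR)/2), with 0.5 added to every cell when any cell is zero. The Pearson χ² statistic is included for comparison. The function names `yule_q`, `yule_q_interval`, and `chi_square` are hypothetical.

```python
import math
from statistics import NormalDist


def yule_q(a, b, c, d):
    """Yule's Q for a 2x2 table: (ad - bc) / (ad + bc), which lies in [-1, 1]."""
    ad, bc = a * d, b * c
    return (ad - bc) / (ad + bc)


def yule_q_interval(a, b, c, d, level=0.95, correction=0.5):
    """Return (Q, lower, upper) for one content-word pair.

    a: sentences containing both words   b: only the first word
    c: only the second word              d: neither word
    Assumption: Woolf interval on the log odds ratio, mapped to the Q scale
    through Q = tanh(log(OR) / 2); `correction` is added to every cell when
    any cell is zero so that the log odds ratio stays finite.
    """
    if min(a, b, c, d) == 0:
        a, b, c, d = (x + correction for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower, upper = log_or - z * se, log_or + z * se
    return yule_q(a, b, c, d), math.tanh(lower / 2), math.tanh(upper / 2)


def chi_square(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table, without continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
```

For example, `yule_q_interval(30, 70, 50, 850)` with made-up counts returns Q together with its interval. Because the interval is built on the log-odds scale and then transformed, it is asymmetric around Q and always stays inside (−1, 1), which a symmetric Q ± z·SE interval does not guarantee near the boundaries.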

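The abstract says only that the exponential model's MLE was approximated with a logistic regression model. One standard way to make that connection, which we assume here rather than take from the paper, runs as follows. Let the model be P(s) = P0(s) exp(λ·f(s)) / Z, with P0 the baseline trigram model. Pool n_real real sentences (label 1, treated as draws from P) with n_base sentences sampled from P0 (label 0). The log-odds that a sentence is real are then log(n_real / n_base) − log Z + λ·f(s), so a logistic regression on f(s) estimates λ through its slopes and log Z through its intercept. The sentence features (mean and minimum Q over a sentence's scored content-word pairs) and all names below (`sentence_features`, `fit_logistic`, `rescore`, `q_table`) are hypothetical choices for illustration.

```python
import math
from itertools import combinations
from statistics import mean


def sentence_features(content_words, q_table):
    """Hypothetical features: mean and minimum Yule's Q over the content-word
    pairs of one sentence that appear in q_table (keys are sorted word pairs).
    Sentences with no scored pair get neutral features [0.0, 0.0]."""
    pairs = (tuple(sorted(p)) for p in combinations(set(content_words), 2))
    qs = [q_table[p] for p in pairs if p in q_table]
    if not qs:
        return [0.0, 0.0]
    return [mean(qs), min(qs)]


def _sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.5, steps=2000):
    """Fit P(real | s) = sigmoid(b + w . f(s)) by batch gradient ascent on the
    mean log-likelihood. ys[i] is 1 for a real sentence, 0 for a baseline sample.
    Returns (b, w): b estimates log(n_real / n_base) - log Z, w estimates lambda."""
    n, k = len(xs), len(xs[0])
    b, w = 0.0, [0.0] * k
    for _ in range(steps):
        gb, gw = 0.0, [0.0] * k
        for x, y in zip(xs, ys):
            r = y - _sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
            gb += r
            for j in range(k):
                gw[j] += r * x[j]
        b += lr * gb / n
        w = [wj + lr * g / n for wj, g in zip(w, gw)]
    return b, w


def rescore(log_p0, features, b, w, n_real, n_base):
    """log P(s) = log P0(s) + w . f(s) - log Z, with log Z recovered from the
    intercept as log(n_real / n_base) - b."""
    log_z = math.log(n_real / n_base) - b
    return log_p0 + sum(wj * fj for wj, fj in zip(w, features)) - log_z
```

Plain gradient ascent keeps the sketch dependency-free; any logistic regression solver would serve. The approximation is only as good as the assumption that the real sentences behave like samples from the exponential model, and the recovered log Z is what allows rescored sentences to be compared on a common scale, for example when computing test-set perplexity.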
Similar resources

Interactive Feature Induction and Logistic Regression for Whole Sentence Exponential Language Models

Whole sentence exponential language models directly model the probability of an entire sentence using arbitrary computable properties of that sentence. We present an interactive methodology for feature induction, and demonstrate it in the simple but common case of a trigram baseline, focusing on features that capture the linguistic notion of semantic coherence. We then show how parametric regre...

Prediction of unwanted pregnancies using logistic regression, probit regression and discriminant analysis

Background: An unwanted pregnancy, one not intended by at least one of the parents, has undesirable consequences for the family and for society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. Methods: In this cross-sectional study, 887 pregnant mothers attending health centers in Khorramabad, Iran, in 2012 were ...

Fitting growth-pattern models of the sunflower head in the cultivars Lakomka and Progress under rainfed conditions

Considering the importance of sunflower as one of the most important plants in the production of edible oils, the present study was conducted to determine the best nonlinear regression function for quantifying the growth of sunflower head diameter over time. In the present study, in order to fit the best regression model explaining the relationship between the increase in sunflower head diameter ...

A SEGMENTED REGRESSION MODEL FOR DESCRIPTION OF MICROBIAL GROWTH

A segmented regression model for the description of microbial growth has been suggested. The model is able to predict exponential growth, logistic growth, logistic growth with a phase of decline, diauxic growth, microbial growth in synchronous cultures, and oscillatory growth.

Publication date: 2000